Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit

Authors

  • Florian Tramèr
  • Vaggelis Atlidakis
  • Roxana Geambasu
  • Daniel J. Hsu
  • Jean-Pierre Hubaux
  • Mathias Humbert
  • Ari Juels
  • Huang Lin
Abstract

In today’s data-driven world, programmers routinely incorporate user data into complex algorithms, heuristics, and application pipelines. While often beneficial, this practice can have unintended and detrimental consequences, such as the discriminatory effects identified in Staples’ online pricing algorithm and the racially offensive labels recently found in Google’s image tagger. We argue that such effects are bugs that should be tested for and debugged in a manner similar to functionality, performance, and security bugs. We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm’s outputs (e.g., prices or labels) and user subpopulations, including protected groups (e.g., defined by race or gender). FairTest reports any statistically significant associations to programmers as potential bugs, ranked by their strength and likelihood of being unintentional, rather than necessary effects. We designed FairTest for ease of use by programmers and integrated it into the evaluation framework of SciPy, a popular library for data analytics. We used FairTest experimentally to identify unfair disparate impact, offensive labeling, and disparate rates of algorithmic error in six applications and datasets. As examples, our results reveal subtle biases against older populations in the distribution of error in a real predictive health application, and offensive racial labeling in an image tagger.
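The abstract describes FairTest as flagging statistically significant associations between an algorithm's outputs and user subpopulations, then ranking them by strength. FairTest's actual association metrics, statistical tests, and API are not shown here. The following is only a minimal, hypothetical sketch of that general idea under simplifying assumptions: a binary output (1 = favorable outcome), categorical user attributes, the difference in positive-outcome rates as the association strength, a permutation test for significance, and a Bonferroni correction across subpopulations. All names (`find_associations`, `Finding`, and so on) are illustrative and are not FairTest's interface.

```python
import random
from dataclasses import dataclass


@dataclass
class Finding:
    group: str       # subpopulation label, e.g. "age=old"
    effect: float    # positive-outcome rate inside the group minus the rate outside it
    p_value: float   # permutation-test p-value for that difference


def rate_difference(outputs, in_group):
    """Positive-outcome rate inside the group minus the rate outside it."""
    inside = [o for o, g in zip(outputs, in_group) if g]
    outside = [o for o, g in zip(outputs, in_group) if not g]
    if not inside or not outside:
        return 0.0  # no comparison possible when one side is empty
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def permutation_p_value(outputs, in_group, n_perm=2000, seed=0):
    """Two-sided permutation test: the fraction of random relabelings whose
    rate difference is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(rate_difference(outputs, in_group))
    labels = list(in_group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between group membership and output
        if abs(rate_difference(outputs, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


def find_associations(outputs, attributes, alpha=0.05, n_perm=2000):
    """Test every attribute=value subpopulation, keep those significant after a
    Bonferroni correction, and rank them by absolute effect size."""
    tests = []
    for name, values in attributes.items():
        for value in sorted(set(values)):
            tests.append((f"{name}={value}", [v == value for v in values]))
    if not tests:
        return []
    threshold = alpha / len(tests)  # Bonferroni: split alpha across all tests
    findings = []
    for label, in_group in tests:
        p = permutation_p_value(outputs, in_group, n_perm=n_perm)
        if p < threshold:
            findings.append(Finding(label, rate_difference(outputs, in_group), p))
    # Strongest associations first; sorting is stable for ties.
    return sorted(findings, key=lambda f: abs(f.effect), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: output 1 = favorable outcome (e.g., a discount).
    ages = ["old" if i % 2 else "young" for i in range(200)]
    genders = ["F" if i % 4 < 2 else "M" for i in range(200)]
    outputs = [int((a == "young") == (i % 5 != 0)) for i, a in enumerate(ages)]
    for f in find_associations(outputs, {"age": ages, "gender": genders}):
        print(f"{f.group}: effect={f.effect:+.2f}, p={f.p_value:.4f}")
```

In the toy data the favorable outcome depends on age but is balanced across gender, so only the two age subpopulations should be reported. A real tool would also need to handle continuous outputs, finer-grained subpopulations, and explanatory attributes that make an association necessary rather than a bug, which this sketch deliberately omits.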

Similar resources

The ρ Operator: Discovering and Ranking Associations on the Semantic Web

In this paper, we introduce an approach that supports querying for Semantic Associations on the Semantic Web. Semantic Associations capture complex relationships between entities involving sequences of predicates, and sets of predicate sequences that interact in complex ways. Detecting such associations is at the heart of many research and analytical activities that are crucial to applications ...


Nusselt Number Estimation along a Wavy Wall in an Inclined Lid-driven Cavity using Adaptive Neuro-Fuzzy Inference System (ANFIS)

In this study, an adaptive neuro-fuzzy inference system (ANFIS) was developed to determine the Nusselt number (Nu) along a wavy wall in a lid-driven cavity under mixed convection regime. Firstly, the main data set of input/output vectors for training, checking and testing of the ANFIS was prepared based on the numerical results of the lattice Boltzmann method (LBM). Then, the ANFIS was develope...


Designing an approprate solenoid and magnetic field for the HZDR laser-driven beamline

Nowadays, due to the high costs and large dimensions of the conventional proton accelerators, other optimal methods for producing the proton beam have been studied. Using of Laser-driven proton accelerators is one of the important and new methods. In laser-driven ion acceleration, a highly ultra-intense laser pulse interacts with solid density targets and will create a plasma media that will ac...


Long-term Iran's inflation analysis using varying coefficient model

Varying coefficient Models are among the most important tools for discovering the dynamic patterns when a fixed pattern does not fit adequately well on the data, due to existing diverse temporal or local patterns. These models are natural extensions of classical parametric models that have achieved great popularity in data analysis with good interpretability. The high flexibility and interpretab...


Data-Driven Approaches to Improve the Quality of Clinical Processes: A Systematic Review

Background: Considering the emergence of electronic health records and their related technologies, an increasing attention is paid to data driven approaches like machine learning, data mining, and process mining. The aim of this paper was to identify and classify these approaches to enhance the quality of clinical processes. Methods: In order to determine the knowledge related to the research ...


Journal:
  • CoRR

Volume: abs/1510.02377


Publication date: 2015